AITopics

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

Neural Information Processing SystemsFeb-16-2026, 12:07:56 GMT

ac112e8ffc4e5b9ece32070440a8ca43-Paper-Conference.pdf

information retrieval, machine learning, natural language, (20 more...)

Country:

Asia > China (0.05)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > Singapore (0.04)

Genre: Research Report (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
(2 more...)

Neural Information Processing SystemsFeb-16-2026, 07:50:42 GMT

853e781cb2af58956ed5c89aa59da3fc-Paper-Conference.pdf

information retrieval, machine learning, natural language, (17 more...)

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Myanmar > Tanintharyi Region > Dawei (0.04)
Asia > China (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Genre: Research Report > Experimental Study (1.00)

Industry:

Education (0.67)
Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Neural Information Processing SystemsFeb-11-2026, 03:36:50 GMT

c88d8d0a6097754525e02c2246d8d27f-Supplemental.pdf

imagenet-1k, relevant sample, validation, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Neural Information Processing SystemsDec-25-2025, 00:37:13 GMT

End-to-End Training of Multi-Document Reader and Retriever for Open-Domain Question Answering

We present an end-to-end differentiable training method for retrieval-augmented open-domain question answering systems that combine information from multiple retrieved documents when generating answers.

end-to-end training, multi-document reader and retriever, name change, (4 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.80)

Ryu, Hyunseok, Shin, Wonjune, Park, Hyun

SHRAG: AFrameworkfor Combining Human-Inspired Search with RAG

arXiv.org Artificial IntelligenceDec-2-2025

Retrieval-Augmented Generation (RAG) is gaining recognition as one of the key technological axes for next generation information retrieval, owing to its ability to mitigate the hallucination phenomenon in Large Language Models (LLMs)and effectively incorporate up-to-date information. However, specialized expertise is necessary to construct ahigh-quality retrieval system independently; moreover, RAGdemonstratesrelativelyslowerprocessing speeds compared to conventional pure retrieval systems because it involves both retrieval and generation stages. Accordingly, this study proposes SHRAG, a novel framework designed to facilitate the seamless integration of Information Retrieval and RAG while simultaneously securing precise retrieval performance. SHRAG utilizes a Large Language Model as a Query Strategist to automatically transform unstructured natural language queries into logically structured search queries, subsequently performing Boolean retrieval to emulate the search process of an expert human searcher. Furthermore, it incorporates multilingual query expansion and a multilingual embedding model, enabling it to perform efficient cross-lingual question answering within the multilingual dataset environment of the ScienceON Challenge. Experimental results demonstrate that the proposed method, combining logical retrieval capabilities and generative reasoning, can significantly enhance the accuracy and reliability of RAG systems. Furthermore, SHRAG movesbeyondconventionaldocument-centric retrieval methods, presenting the potential for a new search paradigm capable of providing direct and reliable responses to queries.

large language model, machine learning, natural language, (18 more...)

2512.00772

Country:

North America > United States (0.28)
Asia (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Education > Educational Setting > Higher Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (0.48)

Jarolím, Antonín, Fajčík, Martin, Makaiová, Lucia

Can LLMs extract human-like fine-grained evidence for evidence-based fact-checking?

arXiv.org Artificial IntelligenceNov-27-2025

Misinformation frequently spreads in user comments under online news articles, highlighting the need for effective methods to detect factually incorrect information. To strongly support or refute claims extracted from such comments, it is necessary to identify relevant documents and pinpoint the exact text spans that justify or contradict each claim. This paper focuses on the latter task -- fine-grained evidence extraction for Czech and Slovak claims. We create new dataset, containing two-way annotated fine-grained evidence created by paid annotators. We evaluate large language models (LLMs) on this dataset to assess their alignment with human annotations. The results reveal that LLMs often fail to copy evidence verbatim from the source text, leading to invalid outputs. Error-rate analysis shows that the {llama3.1:8b model achieves a high proportion of correct outputs despite its relatively small size, while the gpt-oss-120b model underperforms despite having many more parameters. Furthermore, the models qwen3:14b, deepseek-r1:32b, and gpt-oss:20b demonstrate an effective balance between model size and alignment with human annotations.

large language model, machine learning, natural language, (16 more...)

2511.21401

Country: Europe > Czechia (0.14)

Genre: Research Report (0.65)

Industry: Media > News (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)

Macmillan-Scott, Olivia, Goworek, Roksana, Özyiğit, Eda B.

Generative Query Expansion with Multilingual LLMs for Cross-Lingual Information Retrieval

arXiv.org Artificial IntelligenceNov-25-2025

Query expansion is the reformulation of a user query by adding semantically related information, and is an essential component of monolingual and cross-lingual information retrieval used to ensure that relevant documents are not missed. Recently, multilingual large language models (mLLMs) have shifted query expansion from semantic augmentation with synonyms and related words to pseudo-document generation. Pseudo-documents both introduce additional relevant terms and bridge the gap between short queries and long documents, which is particularly beneficial in dense retrieval. This study evaluates recent mLLMs and fine-tuned variants across several generative expansion strategies to identify factors that drive cross-lingual retrieval performance. Results show that query length largely determines which prompting technique is effective, and that more elaborate prompts often do not yield further gains. Substantial linguistic disparities persist: cross-lingual query expansion can produce the largest improvements for languages with the weakest baselines, yet retrieval is especially poor between languages written in different scripts. Fine-tuning is found to lead to performance gains only when the training and test data are of similar format. These outcomes underline the need for more balanced multilingual and cross-lingual training and evaluation resources.

artificial intelligence, large language model, natural language, (16 more...)

2511.19325

Country:

Europe (1.00)
North America > United States (0.68)
Asia > Middle East (0.46)

Genre: Research Report > New Finding (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval > Query Processing (1.00)

Xu, Yifan, Gupta, Vipul, Aggarwal, Rohit, Mahadevan, Varsha, Krishnamachari, Bhaskar

Cluster-based Adaptive Retrieval: Dynamic Context Selection for RAG Applications

arXiv.org Artificial IntelligenceNov-20-2025

Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by pulling in external material, document, code, manuals, from vast and ever-growing corpora, to effectively answer user queries. The effectiveness of RAG depends significantly on aligning the number of retrieved documents with query characteristics: narrowly focused queries typically require fewer, highly relevant documents, whereas broader or ambiguous queries benefit from retrieving more extensive supporting information. However, the common static top-k retrieval approach fails to adapt to this variability, resulting in either insufficient context from too few documents or redundant information from too many. Motivated by these challenges, we introduce Cluster-based Adaptive Retrieval (CAR), an algorithm that dynamically determines the optimal number of documents by analyzing the clustering patterns of ordered query-document similarity distances. CAR detects the transition point within similarity distances, where tightly clustered, highly relevant documents shift toward less pertinent candidates, establishing an adaptive cut-off that scales with query complexity. On Coinbase's CDP corpus and the public MultiHop-RAG benchmark, CAR consistently picks the optimal retrieval depth and achieves the highest TES score, outperforming every fixed top-k baseline. In downstream RAG evaluations, CAR cuts LLM token usage by 60%, trims end-to-end latency by 22%, and reduces hallucinations by 10% while fully preserving answer relevance. Since integrating CAR into Coinbase's virtual assistant, we've seen user engagement jump by 200%.

large language model, machine learning, natural language, (14 more...)

2511.14769

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.82)

Industry: Banking & Finance > Trading (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)